学习分解的表示形式需要监督或引入特定模型设计和学习限制作为偏见。Infogan是一个流行的分离框架,通过最大化潜在表示及其相应生成的图像之间的相互信息来学习无监督的分解表示形式。通过引入辅助网络和潜在回归损失的培训来实现共同信息的最大化。在这篇简短的探索性论文中,我们研究了希尔伯特 - 史密特独立标准(HSIC)的使用,以近似潜在表示和图像之间的相互信息,称为HSIC-INFOGAN。直接优化HSIC损失可以避免需要额外的辅助网络。我们定性地比较了每个模型中的分离水平,提出了一种调整HSIC-INFOGAN超参数的策略,并讨论了HSIC-INFOGAN在医疗应用中的潜力。
translated by 谷歌翻译
由于成本限制,减少医学图像分割中密集注释的面具的需求很重要。在本文中,我们考虑仅通过使用图像级标签进行训练来推断脑病变的像素级预测的问题。通过利用生成扩散概率模型(DPM)的最新进展,我们综合了“如果不存在X病理学,患者将如何出现?”。观察到的患者状态与健康反事实之间的差异图像可用于推断病理位置。我们产生的反事实是对应于输入的最小变化,以使其转化为健康域。这需要在DPM中使用健康和不健康的数据进行培训。我们通过通过隐式指导以及注意力条件而不是使用分类器来操纵生成过程来改善以前的反事实DPM。代码可在https://github.com/vios-s/diff-scm上找到。
translated by 谷歌翻译
培训医学图像分割模型通常需要大量标记的数据。相比之下,人类可以迅速学会从医学(例如MRI和CT)图像中准确地识别出有限的指导性解剖学。这种识别能力可以很容易地推广到来自不同临床中心的新图像。这种快速且可普遍的学习能力主要是由于人脑中图像模式的组成结构所致,该图像模式在医学图像分割中较少纳入。在本文中,我们将人类解剖结构的组成成分(即模式)建模为可学习的von-mises-fisher(VMF)内核,它们对从不同领域(例如临床中心)收集的图像具有鲁棒性。图像特征可以分解为具有组成操作的组件(或由)组成的组件,即VMF可能性。 VMF的可能性证明了每个解剖部分在图像的每个位置的可能性。因此,可以根据VMF的可能性预测分割掩模。此外,使用重建模块,未标记的数据也可以通过重新组合重建输入图像来学习VMF内核和可能性。广泛的实验表明,所提出的VMFNET在两个基准上实现了改善的概括性能,尤其是在注释有限的情况下。代码可在以下网址公开获取:https://github.com/vios-s/vmfnet。
translated by 谷歌翻译
甚至在没有受限,监督的情况下,也提出了甚至在没有受限或有限的情况下学习普遍陈述的方法。使用适度数量的数据可以微调新的目标任务,或者直接在相应任务中实现显着性能的无奈域中使用的良好普遍表示。这种缓解数据和注释要求为计算机愿景和医疗保健的应用提供了诱人的前景。在本辅导纸上,我们激励了对解散的陈述,目前关键理论和详细的实际构建块和学习此类表示的标准的需求。我们讨论医学成像和计算机视觉中的应用,强调了在示例钥匙作品中进行的选择。我们通过呈现剩下的挑战和机会来结束。
translated by 谷歌翻译
In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, where no ground truth annotations are involved. Deep learning-based state-of-the-art methods for LiDAR MOS strongly depend on annotated ground truth data, which is expensive to obtain and scarce in existence. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that relaxes the problem of unsupervised MOS to a time series clustering problem. More specifically, we propose modeling the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes on the voxel level and its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which can be partitioned by a clustering algorithm into moving or stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance that is comparable to that of supervised state-of-the-art approaches.
translated by 谷歌翻译
In the last years, the number of IoT devices deployed has suffered an undoubted explosion, reaching the scale of billions. However, some new cybersecurity issues have appeared together with this development. Some of these issues are the deployment of unauthorized devices, malicious code modification, malware deployment, or vulnerability exploitation. This fact has motivated the requirement for new device identification mechanisms based on behavior monitoring. Besides, these solutions have recently leveraged Machine and Deep Learning techniques due to the advances in this field and the increase in processing capabilities. In contrast, attackers do not stay stalled and have developed adversarial attacks focused on context modification and ML/DL evaluation evasion applied to IoT device identification solutions. This work explores the performance of hardware behavior-based individual device identification, how it is affected by possible context- and ML/DL-focused attacks, and how its resilience can be improved using defense techniques. In this sense, it proposes an LSTM-CNN architecture based on hardware performance behavior for individual device identification. Then, previous techniques have been compared with the proposed architecture using a hardware performance dataset collected from 45 Raspberry Pi devices running identical software. The LSTM-CNN improves previous solutions achieving a +0.96 average F1-Score and 0.8 minimum TPR for all devices. Afterward, context- and ML/DL-focused adversarial attacks were applied against the previous model to test its robustness. A temperature-based context attack was not able to disrupt the identification. However, some ML/DL state-of-the-art evasion attacks were successful. Finally, adversarial training and model distillation defense techniques are selected to improve the model resilience to evasion attacks, without degrading its performance.
translated by 谷歌翻译
Cybercriminals are moving towards zero-day attacks affecting resource-constrained devices such as single-board computers (SBC). Assuming that perfect security is unrealistic, Moving Target Defense (MTD) is a promising approach to mitigate attacks by dynamically altering target attack surfaces. Still, selecting suitable MTD techniques for zero-day attacks is an open challenge. Reinforcement Learning (RL) could be an effective approach to optimize the MTD selection through trial and error, but the literature fails when i) evaluating the performance of RL and MTD solutions in real-world scenarios, ii) studying whether behavioral fingerprinting is suitable for representing SBC's states, and iii) calculating the consumption of resources in SBC. To improve these limitations, the work at hand proposes an online RL-based framework to learn the correct MTD mechanisms mitigating heterogeneous zero-day attacks in SBC. The framework considers behavioral fingerprinting to represent SBCs' states and RL to learn MTD techniques that mitigate each malicious state. It has been deployed on a real IoT crowdsensing scenario with a Raspberry Pi acting as a spectrum sensor. More in detail, the Raspberry Pi has been infected with different samples of command and control malware, rootkits, and ransomware to later select between four existing MTD techniques. A set of experiments demonstrated the suitability of the framework to learn proper MTD techniques mitigating all attacks (except a harmfulness rootkit) while consuming <1 MB of storage and utilizing <55% CPU and <80% RAM.
translated by 谷歌翻译
Stress has a great effect on people's lives that can not be understated. While it can be good, since it helps humans to adapt to new and different situations, it can also be harmful when not dealt with properly, leading to chronic stress. The objective of this paper is developing a stress monitoring solution, that can be used in real life, while being able to tackle this challenge in a positive way. The SMILE data set was provided to team Anxolotl, and all it was needed was to develop a robust model. We developed a supervised learning model for classification in Python, presenting the final result of 64.1% in accuracy and a f1-score of 54.96%. The resulting solution stood the robustness test, presenting low variation between runs, which was a major point for it's possible integration in the Anxolotl app in the future.
translated by 谷歌翻译
The task of reconstructing 3D human motion has wideranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view Mo- Cap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion thus making exciting applications like motion analysis and motiondriven animation accessible to the general public. However, performance of existing HMR methods degrade when the video contains challenging and dynamic motion that is not in existing MoCap datasets used for training. This reduces its appeal as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action,and show that NeMo achieves better 3D reconstruction compared to various baselines.
translated by 谷歌翻译
The field of Automatic Music Generation has seen significant progress thanks to the advent of Deep Learning. However, most of these results have been produced by unconditional models, which lack the ability to interact with their users, not allowing them to guide the generative process in meaningful and practical ways. Moreover, synthesizing music that remains coherent across longer timescales while still capturing the local aspects that make it sound ``realistic'' or ``human-like'' is still challenging. This is due to the large computational requirements needed to work with long sequences of data, and also to limitations imposed by the training schemes that are often employed. In this paper, we propose a generative model of symbolic music conditioned by data retrieved from human sentiment. The model is a Transformer-GAN trained with labels that correspond to different configurations of the valence and arousal dimensions that quantitatively represent human affective states. We try to tackle both of the problems above by employing an efficient linear version of Attention and using a Discriminator both as a tool to improve the overall quality of the generated music and its ability to follow the conditioning signals.
translated by 谷歌翻译